Neural mechanism to simulate a scale-invariant future
Predicting future events, and their order, is important for efficient planning. We propose a neural
mechanism to non-destructively translate the current state of memory into the future, so as to 
construct an ordered set of future predictions. This framework applies equally well to translations 
in time or in one-dimensional position. In a two-layer memory network that encodes the Laplace 
transform of the external input in real time, translation can be accomplished by modulating the 
weights between the layers. We propose that within each cycle of hippocampal theta oscillations, the 
memory state is swept through a range of translations to yield an ordered set of future predictions. 

We operationalize several neurobiological findings into phenomenological equations constraining 
translation. Combined with constraints based on physical principles requiring scale-invariance and 
coherence in translation across memory nodes, the proposition results in Weber-Fechner spacing 
for the representation of both past (memory) and future (prediction) timelines. The resulting 
expressions are consistent with findings from phase precession experiments in different regions of 
the hippocampus and reward systems in the ventral striatum. The model makes several experimental 
predictions that can be tested with existing technology. 


I. INTRODUCTION 

The brain encodes externally observed stimuli in real time and represents information about the
current spatial location and temporal history of recent events as activity distributed over neural
networks. Although we are physically localized in space and time, it is often useful to make
decisions based on non-local events, by anticipating events that will occur in the distant future
or at remote locations. Clearly, flexible access to the current state of spatio-temporal memory is
crucial for the brain to successfully anticipate events that might occur in the immediate next
moment. To anticipate events that might occur after a given time, or at a given distance from the
current location, the brain needs to simulate how the current state of the spatio-temporal memory
representation will have changed after waiting for that amount of time or after moving through
that distance. In this paper, we propose that the brain can swiftly and non-destructively perform
space/time-translation operations on the memory state so as to anticipate events at various future
moments and/or remote locations.

The rodent brain contains a rich and detailed representation of current spatial location and
temporal history. Some neurons in the hippocampus, called place cells, fire in circumscribed
locations within an environment, referred to as their place fields. Early work excluded confounds
based on visual [1] or olfactory cues [2], suggesting that the activity of place cells is a
consequence of some form of path integration mechanism guided by the animal's velocity. Other
neurons in the hippocampus, called time cells, fire during a circumscribed period of time within
a delay interval [?]. By analogy to place cells, a set of time cells represents the animal's
current temporal position relative to past events. Some researchers have long hypothesized a deep
connection between the hippocampal representations of place and time [?].

Motivated by the spatial and temporal memory represented in the hippocampus, we hypothesize that
the translation operation required to anticipate events in the distant future engages this part
of the brain [?]. We hypothesize that theta oscillations, a well-characterized rhythm of 4-8 Hz in
the local field potential observed in the hippocampus, may be responsible for the translation
operation. In particular, we hypothesize that sequential translations of different magnitudes take
place at different phases within a cycle of the theta oscillation, such that a timeline of
anticipated future events (or equivalently a spaceline of anticipated events at distant locations)
is swept out in a single cycle (fig. 1).

FIG. 1. a. Theta oscillations of 4-8 Hz are observed in the voltage recorded from the hippocampus.
Hypothesis: within a theta cycle, a timeline of future translations of magnitude $\delta$ is
constructed. b. Two-layer network with theta-modulated connections. The t layer receives external
input $\mathbf{f}$ in real time and encodes its Laplace transform. The Laplace transform is
inverted via a synaptic operator $\mathbf{L}_k^{-1}$ to yield an estimate of the function
$\mathbf{f}$ on the T layer nodes. By periodically manipulating the weights in
$\mathbf{L}_k^{-1}$, the memory state represented in the T layer can be translated to represent
its future states.

Theta oscillations are prominently observed during periods of navigation [12]. Critically, there
is a systematic relationship between the animal's position within a neuron's place field and the
phase of the theta oscillation at which that neuron fires [?], known as phase precession. This
suggests that the phase of firing of the place cells conveys information about the anticipated
future location of the animal. This provides a strong motivation for our hypothesis that the phase
of the theta oscillation is linked to the translation operation.


A. Overview 

This paper develops a computational mechanism for the translation operation of a spatial/temporal
memory representation constructed from a two-layer neural network model [?], and links it to theta
oscillations by imposing certain constraints based on neurophysiological observations and on
physical principles we expect the brain to satisfy. Since the focus here is to understand the
computational mechanism of a higher-level cognitive phenomenon, the imposed constraints and the
resulting derivation should be viewed at a phenomenological level, and not as emerging from
biophysically detailed neural interactions.

Computationally, we assume that the memory representation is constructed by a two-layer network
(fig. 1) where the first layer encodes the Laplace transform of externally observed stimuli in
real time, and the second layer approximately inverts the Laplace transform to represent a fuzzy
estimate of the actual stimulus history [?]. With access to the instantaneous velocity of motion,
this two-layer network representing temporal memory can be straightforwardly generalized to
represent one-dimensional spatial memory [?]. Hence, in the context of this two-layer network,
time-translation of the temporal memory representation can be considered mathematically
equivalent to space-translation of the spatial memory representation.

Based on the simple, yet powerful, mathematical observation that a translation operation can be
performed in the Laplace domain as an instantaneous point-wise product, we propose that the
translation operation is achieved by modulating the connection weights between the two layers
within each theta cycle (fig. 1). The translated representations can then be used to predict
events in the distant future and at remote locations. In constructing the translation operation,
we impose two physical principles we expect the brain to satisfy. The first principle is
scale-invariance, the requirement that all scales (temporal or spatial) represented in the memory
are treated equally in implementing the translation. The second principle is coherence, the
requirement that at any moment all nodes forming the memory representation are in sync,
translated by the same amount.

Further, to implement the computational mechanism of translation as a neural mechanism, we impose
certain phenomenological constraints based on neurophysiological observations. First, there
exists a dorsoventral axis in the hippocampus of a rat's brain, and the size of place fields
increases systematically from the dorsal to the ventral end [?]. In light of this observation, we
hypothesize that the nodes representing different temporal and spatial scales of memory are
ordered along the dorsoventral axis. Second, the phase of the theta oscillation is not uniform
along the dorsoventral axis; phase advances from the dorsal to the ventral end like a traveling
wave [18, 19] with a phase difference of about $\pi$ from one end to the other. Third, the
synaptic weights change as a function of the phase of the theta oscillation throughout the
hippocampus [20, 21]. In light of this observation, we hypothesize that the change in the
connection strengths between the two layers required to implement the translation operation
depends only on the local phase of the theta oscillation at any node (neuron).

In section II we impose the above-mentioned physical principles and phenomenological constraints
to derive quantitative relationships for the distribution of scales of the nodes representing the
memory and for the theta-phase dependence of the translation operation. This yields specific
forms of phase precession in the nodes representing the memory as well as in the nodes
representing future prediction. Section III compares these forms to neurophysiological phase
precession observed in the hippocampus and ventral striatum. Section III also makes explicit
neurophysiological predictions that could verify our hypothesis that theta oscillations implement
the translation operation to construct a timeline of future predictions.


II. MATHEMATICAL MODEL 

In this section we start with a basic overview of the two-layer memory model and summarize the
relevant details from previous work [?, 22] to serve as background. Following that, we derive the
equations that allow the memory nodes to be coherently time-translated to various future moments
in synchrony with the theta oscillations. Finally we derive the predictions generated for various
future moments from the time-translated memory states.


A. Theoretical background 

The memory model is implemented as a two-layer feedforward network (fig. 1) where the t layer
holds the Laplace transform of the recent past and the T layer reconstructs a temporally fuzzy
estimate of past events [?, 22]. Let the stimulus at any time $\tau$ be denoted as
$\mathbf{f}(\tau)$. The nodes in the t layer are leaky integrators parametrized by their decay
rate $s$, and are all independently activated by the stimulus. The nodes are assumed to be
arranged in order of their $s$ values. The nodes in the T layer are in one-to-one correspondence
with the nodes in the t layer and hence can also be parametrized by the same $s$. The feedforward
connections from the t layer into the T layer are prescribed to satisfy certain mathematical
properties which are described below. The activity of the two layers is given by


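$$\frac{d}{d\tau}\,\mathbf{t}(\tau, s) = -s\,\mathbf{t}(\tau, s) + \mathbf{f}(\tau) \tag{1}$$

$$\mathbf{T}(\tau, s) = [\mathbf{L}_k^{-1}]\,\mathbf{t}(\tau, s) \tag{2}$$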


By integrating eq. 1, note that the t layer encodes the Laplace transform of the entire past of
the stimulus function leading up to the present. The $s$ values distributed over the t layer
represent the (real) Laplace domain variable. The fixed connections between the t layer and the
T layer, denoted by the operator $\mathbf{L}_k^{-1}$ in eq. 2, are constructed to reflect an
approximation to the inverse Laplace transform. In effect, the Laplace-transformed stimulus
history, which is distributed over the t layer nodes, is inverted by $\mathbf{L}_k^{-1}$ such
that a fuzzy (or coarse-grained) estimate of the actual stimulus value from various past moments
is represented along the different T layer nodes.

More precisely, treating the $s$ values of the nodes as continuous, the $\mathbf{L}_k^{-1}$
operator can be succinctly expressed as

$$\mathbf{T}(\tau, s) = \frac{(-1)^k}{k!}\, s^{k+1}\, \mathbf{t}^{(k)}(\tau, s) \equiv [\mathbf{L}_k^{-1}]\,\mathbf{t}(\tau, s) \tag{3}$$

Here $\mathbf{t}^{(k)}(\tau, s)$ corresponds to the $k$-th derivative of $\mathbf{t}(\tau, s)$
with respect to $s$. It can be proven that the $\mathbf{L}_k^{-1}$ operator executes an
approximation to the inverse Laplace transformation, and that the approximation grows more and
more accurate for larger and larger values of $k$ [23]. Further details of $\mathbf{L}_k^{-1}$
depend on the $s$ values chosen for the nodes [22], but these details are not relevant for this
paper, as the $s$ values of neighboring nodes are assumed to be close enough that the analytic
expression for $\mathbf{L}_k^{-1}$ given by eq. 3 is accurate.
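As an illustrative check of the statement that the leaky integrators of eq. 1 hold the Laplace transform of the stimulus history, the following minimal sketch integrates eq. 1 in discrete time and compares the result with a direct Riemann sum of the transform. The step size, the set of decay rates, and the two-pulse stimulus are assumptions chosen only for illustration; they are not taken from the model.

```python
import math

def run_leaky_integrators(stimulus, s_values, dt):
    """Advance eq. 1 for every decay rate s, one stimulus sample per time step.

    Each step applies the exact decay exp(-s*dt) and then adds the new input.
    """
    t = [0.0] * len(s_values)
    for f in stimulus:
        t = [t_i * math.exp(-s * dt) + f * dt for t_i, s in zip(t, s_values)]
    return t

def laplace_of_history(stimulus, s, dt):
    """Riemann sum of the Laplace transform of the past, measured back from now."""
    last = len(stimulus) - 1
    return sum(f * math.exp(-s * (last - i) * dt) * dt
               for i, f in enumerate(stimulus))

DT = 0.01                          # assumed time step
S_VALUES = [0.5, 1.0, 2.0, 4.0]    # assumed decay rates of four t-layer nodes

# Hypothetical stimulus: two brief pulses, roughly 6 and 2 time units in the past.
STIMULUS = [0.0] * 1000
for start in (400, 800):
    for i in range(start, start + 10):
        STIMULUS[i] = 1.0

t_now = run_leaky_integrators(STIMULUS, S_VALUES, DT)
expected = [laplace_of_history(STIMULUS, s, DT) for s in S_VALUES]

for s, a, b in zip(S_VALUES, t_now, expected):
    print(f"s = {s:3.1f}   integrator = {a:.6f}   Laplace sum = {b:.6f}")
```

The two columns agree because each node weights an input that arrived a time $\tau'$ ago by $e^{-s\tau'}$, which is exactly the kernel of the (real) Laplace transform.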

To emphasize the properties of this memory representation, consider the stimulus
$\mathbf{f}(\tau)$ to be a Dirac delta function at $\tau = 0$. From eqs. 1 and 3, the T layer
activity following the stimulus presentation ($\tau > 0$) turns out to be

$$\mathbf{T}(\tau, s) = \frac{s}{k!}\,[s\tau]^k\, e^{-s\tau} \tag{4}$$

Note that nodes with different $s$ values in the T layer peak in activity after different delays
following the stimulus; hence the T layer nodes behave like time cells. In particular, a node
with a given $s$ peaks in activity at a time $\tau = k/s$ following the stimulus. Moreover,
viewing the activity of any node as a distribution around its peak time ($k/s$), we see that the
shape of this distribution is exactly the same for all nodes once $\tau$ is rescaled to align the
peaks of all the nodes. In other words, the activities of different nodes of the T layer
represent fuzzy estimates of the past information from different timescales, and the fuzziness
associated with them is directly proportional to the timescale they represent, while maintaining
the exact same shape of fuzziness. For this reason, the T layer represents the past information
in a scale-invariant fashion.
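The two properties just described, a peak at $\tau = k/s$ and a common shape after rescaling, can be checked numerically. The sketch below evaluates the impulse response $\frac{s}{k!}(s\tau)^k e^{-s\tau}$ that follows from eqs. 1 and 3; the values of $k$ and $s$ and the search grid are arbitrary assumptions made for illustration.

```python
import math

def t_layer_impulse_response(tau, s, k):
    """T-layer activity of node s at time tau after a unit impulse at tau = 0."""
    return (s / math.factorial(k)) * (s * tau) ** k * math.exp(-s * tau)

def peak_time(s, k, t_max=40.0, dt=0.001):
    """Brute-force search for the time at which node s is most active."""
    best_tau, best_val = 0.0, -1.0
    for i in range(1, int(round(t_max / dt)) + 1):
        tau = i * dt
        val = t_layer_impulse_response(tau, s, k)
        if val > best_val:
            best_tau, best_val = tau, val
    return best_tau

K = 4                        # assumed order of the inverse-Laplace approximation
S_VALUES = [8.0, 2.0, 0.5]   # assumed decay rates

for s in S_VALUES:
    print(f"s = {s:4.1f}   peak at tau = {peak_time(s, K):.3f}   k/s = {K / s:.3f}")

def rescaled_activity(x, s, k):
    """Activity divided by s, evaluated at tau = x/s; x = s*tau is the rescaled time."""
    return t_layer_impulse_response(x / s, s, k) / s
```

Plotting `rescaled_activity` against `x` for different `s` gives a single curve, which is the sense in which the representation is scale-invariant: the width of each node's temporal window grows in proportion to its peak time.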

This two-layer memory architecture can also represent one-dimensional spatial memory, analogous
to the representation of temporal memory in the T layer [?]. If the stimulus $\mathbf{f}$ is
interpreted as a landmark encountered at a particular location in a one-dimensional spatial
arena, then the t layer nodes can be made to represent the Laplace transform of the landmark
treated as a spatial function with respect to the current location. By modifying eq. 1 to

$$\frac{d}{d\tau}\,\mathbf{t}(\tau, s) = v\left[-s\,\mathbf{t}(\tau, s) + \mathbf{f}(\tau)\right], \tag{5}$$

where $v$ is the velocity of motion, the temporal dependence of the t layer activity can be
converted to spatial dependence.^1 By employing the $\mathbf{L}_k^{-1}$ operator on this modified
t layer activity (eq. 5), it is straightforward to construct a layer of nodes (analogous to T)
that exhibit peak activity at different distances from the landmark. Thus the two-layer memory
architecture can be trivially extended to yield place cells in one dimension.
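To make the conversion from temporal to spatial dependence concrete, the sketch below integrates eq. 5 after a landmark has been passed, for two different running speeds. It assumes the landmark acts as a unit impulse in position, so that the node starts at $\mathbf{t} = 1$ when the landmark is crossed, and it uses hypothetical speed profiles; both are illustrative assumptions.

```python
import math

def run_track(speeds, s, dt):
    """Integrate eq. 5 with f = 0 after a landmark at position 0 has set t = 1.

    Returns a list of (position, t) pairs, one per time step.
    """
    t, position, trace = 1.0, 0.0, []
    for v in speeds:
        t *= math.exp(-s * v * dt)   # exact solution of dt/dtau = -v*s*t over one step
        position += v * dt
        trace.append((position, t))
    return trace

S = 1.5      # assumed decay rate of one t-layer node
DT = 0.01    # assumed time step

slow = run_track([0.5] * 400, S, DT)   # 4 time units at speed 0.5
fast = run_track([2.0] * 100, S, DT)   # 1 time unit at speed 2.0

print("slow run:", slow[-1])
print("fast run:", fast[-1])
print("exp(-s * distance):", math.exp(-S * 2.0))
```

Both runs end two distance units past the landmark and reach the same activity, $e^{-s \cdot 2}$, even though they took different amounts of time, so the node's activity depends on distance travelled rather than on elapsed time.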

In what follows, rather than referring to translation operations separately on spatial and
temporal memory, we shall simply consider time-translations, with the implicit understanding that
all the results derived can be trivially extended to 1-d spatial memory representations.


B. Time-translating the Memory state 

The two-layer architecture naturally lends itself to time-translations of the memory state in the
T layer, which we shall later exploit to construct a timeline of future predictions. The basic
idea is that if the current state of memory represented in the T layer is used to anticipate the
present (via some prediction mechanism), then a time-translated state of the T layer can be used
to predict events that will occur in the distant future via the same prediction mechanism.
Time-translation means mimicking the T layer activity at a distant future moment based on its
current state. Ideally, translation should be non-destructive, not overwriting the current
activity in the t layer.

^1 Theoretically, the velocity here could be an animal's running velocity in the lab maze or a
mentally simulated human motion while playing video games.

FIG. 2. Traveling theta wave along the $s$ axis. The x-axis is real time. Each point along the
dorsoventral axis corresponds to a different value of $s_n$. The curvy blue lines show the theta
oscillation for several different values of $s$. Lines 1 and 2 connect the positions where the
local phases $\theta_s$ are $0$ and $\pi$ respectively.

Let $\delta$ be the amount by which we intend to time-translate the state of the T layer. So, at
any time $\tau$, the aim is to access $\mathbf{T}(\tau + \delta, s)$ while still preserving the
current t layer activity, $\mathbf{t}(\tau, s)$. This can be easily achieved because the t layer
represents the stimulus history in the Laplace domain. Noting that the Laplace transform of a
$\delta$-translated function is simply the product of $e^{-s\delta}$ and the Laplace transform of
the un-translated function, we see that

$$\mathbf{t}(\tau + \delta, s) = e^{-s\delta}\,\mathbf{t}(\tau, s) \tag{6}$$

Now noting that $\mathbf{T}(\tau + \delta, s)$ can be obtained by employing the
$\mathbf{L}_k^{-1}$ operator on $\mathbf{t}(\tau + \delta, s)$, analogous to eq. 2, we obtain the
$\delta$-translated T activity as

$$\mathbf{T}_\delta(\tau, s) = \mathbf{T}(\tau + \delta, s) = [\mathbf{L}_k^{-1}]\,\mathbf{t}(\tau + \delta, s) = [\mathbf{L}_k^{-1} \cdot \mathbf{R}_\delta]\,\mathbf{t}(\tau, s) \tag{7}$$

where $\mathbf{R}_\delta$ is just a diagonal operator whose rows and columns are indexed by $s$
and whose diagonal entries are $e^{-s\delta}$. The $\delta$-translated activity of the T layer is
now subscripted by $\delta$ as $\mathbf{T}_\delta$, so as to distinguish it from the
un-translated T layer activity given by eq. 2 without a subscript. In this notation the
un-translated state $\mathbf{T}(\tau, s)$ from eq. 2 can be expressed as
$\mathbf{T}_0(\tau, s)$. The time-translated T activity can be obtained from the current t layer
activity if the connection weights between the two layers, given by $\mathbf{L}_k^{-1}$, are
modulated by $\mathbf{R}_\delta$. This computational mechanism of time-translation can be
implemented as a neural mechanism in the brain by imposing certain phenomenological constraints
and physical principles.
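The translation in the Laplace domain can be illustrated with a small numerical sketch: multiplying the current t layer activity by $e^{-s\delta}$ gives the same values the leaky integrators would reach after actually waiting for a time $\delta$, under the assumption that no new stimulus arrives during that interval. The stimulus, the decay rates, and the value of $\delta$ below are hypothetical choices for illustration.

```python
import math

def integrate(stimulus, s, dt, t_start=0.0):
    """Leaky integrator of eq. 1 with exact decay per step, starting from t_start."""
    t = t_start
    for f in stimulus:
        t = t * math.exp(-s * dt) + f * dt
    return t

def translate(t_now, s, delta):
    """Pointwise product with exp(-s*delta); t_now itself is left untouched."""
    return math.exp(-s * delta) * t_now

DT = 0.01
S_VALUES = [0.5, 1.0, 2.0, 4.0]            # assumed decay rates
PAST = [1.0] * 20 + [0.0] * 300            # hypothetical stimulus history
DELTA_STEPS = 150
DELTA = DELTA_STEPS * DT                   # translation distance, 1.5 time units

t_now = [integrate(PAST, s, DT) for s in S_VALUES]

# Option 1: translate instantly by modulating with the diagonal entries exp(-s*delta).
t_translated = [translate(t, s, DELTA) for t, s in zip(t_now, S_VALUES)]

# Option 2: actually wait for delta with no input (assumption: silent interval).
t_waited = [integrate([0.0] * DELTA_STEPS, s, DT, t_start=t)
            for t, s in zip(t_now, S_VALUES)]

for s, a, b in zip(S_VALUES, t_translated, t_waited):
    print(f"s = {s:3.1f}   translated = {a:.6e}   waited = {b:.6e}")
```

Because the translated values are computed from `t_now` without modifying it, the current memory state remains available, which is the sense in which the operation is non-destructive.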

Observation 1: Anatomically, along the dorsoventral axis of the hippocampus, the width of place
fields systematically increases from the dorsal end to the ventral end [?]. Fig. 2 schematically
illustrates this observation by identifying the $s$-axis of the two-layer memory architecture
with the dorsoventral axis of the hippocampus, such that the scales represented by the nodes are
monotonically arranged. Let there be $N + 1$ nodes with monotonically decreasing $s$ values given
by $s_0, s_1, \ldots, s_N$.
Observation 2: The phase of the theta oscillations along the axis is non-uniform, representing a
traveling wave from the dorsal to the ventral part of the hippocampus with a net phase shift of
$\pi$ [18, 19]. The oscillations in fig. 2 symbolize the local field potentials at different
locations of the $s$-axis. The local phase of the oscillation at any position on the $s$-axis is
denoted by $\theta_s$, which ranges from $-\pi$ to $+\pi$ by convention. However, as a reference
we denote the phase at the top (dorsal) end as $\theta_0$, ranging from $0$ to $2\pi$, with the
understanding that the range $(\pi, 2\pi)$ is mapped onto $(-\pi, 0)$. The x-axis in fig. 2 is
time within a theta oscillation, labeled by the phase $\theta_0$.

In this convention, the value of $\theta_s$ discontinuously jumps from $+\pi$ to $-\pi$ as we
move from one cycle of oscillation to the next. In fig. 2 the diagonal (solid-red) line labeled
'2' denotes all the points where this discontinuous jump happens. The diagonal (dashed) line
labeled '1' denotes all the points where $\theta_s = 0$. It is straightforward to infer the
relationship between the phases at any two values of $s$. Taking the nodes to be uniformly spaced
anatomically, the local phase $\theta_s$ of the $n$-th node is related to $\theta_0$ (for
$0 < \theta_s < \pi$) by^2

$$\theta_s/\pi = \theta_0/\pi - n/N. \tag{8}$$

Observation 3: Synaptic weights in the hippocampus are modulated periodically in synchrony with
the phase of the theta oscillation [20, 21]. Based on this observation, we impose the constraint
that the connection strengths between the t and T layers at a particular value of $s$ depend only
on the local phase of the theta oscillations. Thus the diagonal entries in the
$\mathbf{R}_\delta$ operator should only depend on $\theta_s$. We take these entries to be of the
form $\exp(-\Phi_s(\theta_s))$, where $\Phi_s$ is any continuous function of
$\theta_s \in (-\pi, +\pi)$. Heuristically, at any moment within a theta cycle, a T node with a
given $s$ value will be roughly translated by an amount $\delta = \Phi_s(\theta_s)/s$.

Principle 1: Preserve Scale-Invariance

Scale-invariance is an extremely adaptive property for a memory to have; in many cases biological
memories seem to exhibit scale-invariance [?]. As the un-translated T layer activity already
exhibits scale-invariance, we impose the constraint that the time-translated states of T should
also exhibit scale-invariance. This consideration requires the behavior of every node to follow
the same pattern with respect to its local theta phase. This amounts to choosing the functions
$\Phi_s$ to be the same for all $s$, which we shall refer to as $\Phi$.

^2 Since the $s$ values of the nodes are monotonically arranged, we can interchangeably use $s$
or $n$ as subscripts to $\theta$.

Principle 2: Coherence in translation

Since the time-translated memory state is going to be used to make predictions for various
moments in the distant future, it would be preferable if all the nodes were time-translated by
the same amount at any moment within a theta cycle. If not, different nodes would contribute to
predictions for different future moments, leading to noise in the prediction. However, such a
requirement of global coherence cannot be imposed consistently along with principle 1 of
preserving scale-invariance.^3 But in light of prior work [25, 26], which suggests that retrieval
of memory or prediction happens only in one half of the theta cycle,^4 we impose the requirement
of coherence only on those nodes that are all in the positive half of the cycle at any moment.
That is, $\delta = \Phi(\theta_s)/s$ is a constant along any vertical line in the region bounded
between the diagonal lines 1 and 2 shown in fig. 2. Hence for all nodes with
$0 < \theta_s < \pi$, we require

$$\Delta\left(\Phi(\theta_s)/s\right) = \Delta\left(\Phi(\theta_0 - \pi n/N)/s_n\right) = 0, \tag{9}$$

where $\Delta$ denotes the change from one node to the next at a fixed moment.

For coherence as expressed in eq. 9 to hold at all values of $\theta_0$ between $0$ and $2\pi$,
$\Phi(\theta_s)$ must be an exponential function so that $\theta_0$ can be functionally decoupled
from $n$; consequently $s_n$ should also have an exponential dependence on $n$. So the general
solution to eq. 9 when $0 < \theta_s < \pi$ can be written as

$$\Phi(\theta_s) = \Phi_0 \exp[b\,\theta_s] \tag{10}$$

$$s_n = s_0\,(1 + c)^{-n} \tag{11}$$

where $c$ is a positive number. Substituting eqs. 10 and 11 into eq. 9 relates the constants
through $(1 + c)^N = \exp[b\pi]$. In this paper, we shall take $c \ll 1$, so that the analytic
approximation for the $\mathbf{L}_k^{-1}$ operator given in terms of the $k$-th derivative along
the $s$ axis in eq. 3 is valid.

Thus the requirement of coherence in time-translation implies that the $s$ values of the nodes
(the timescales represented by the nodes) are spaced out exponentially, which can be referred to
as a Weber-Fechner scale, a commonly used terminology in cognitive science. Remarkably, this
result agrees with the requirement of exactly the same scaling that arises when the predictive
information contained in the memory system is maximized in response to long-range correlated
signals [22]. This feature allows the memory system to represent scale-invariantly coarse-grained
past information from timescales exponentially related to the number of nodes.

The maximum value attained by the function $\Phi(\theta_s)$ is at $\theta_s = \pi$, and the
maximum value is $\Phi_{max} = \Phi_0 \exp[b\pi]$, such that $\Phi_{max}/\Phi_0 = s_0/s_N$ and
$b = (1/\pi) \log(\Phi_{max}/\Phi_0)$. To ensure continuity around $\theta_s = 0$, we take eq. 10
to hold true even for $\theta_s \in (-\pi, 0)$. However, since notationally $\theta_s$ makes a
jump from $+\pi$ to $-\pi$, $\Phi(\theta_s)$ would exhibit a discontinuity at the diagonal line 2
in fig. 2, from $\Phi_{max}$ (corresponding to $\theta_s = \pi$) to
$\Phi_{min} = \Phi_0^2/\Phi_{max}$ (corresponding to $\theta_s = -\pi$).


^3 This is easily seen by noting that each node will have a maximum translation inversely
proportional to its $s$-value to satisfy principle 1.

^4 This hypothesis follows from the observation that while both synaptic transmission and
synaptic plasticity are modulated by theta phase, they are out of phase with one another. That
is, while certain synapses are learning, they are not communicating information and vice versa.
This led to the hypothesis that the phases where plasticity is optimal are specialized for
encoding whereas the phases where transmission is optimal are specialized for retrieval.


Given these considerations, at any instant within a theta cycle, referenced by the phase
$\theta_0$, the amount $\delta$ by which the memory state is time-translated can be derived from
eqs. 8, 10, and 11 as

$$\delta(\theta_0) = (\Phi_0/s_0) \exp[b\,\theta_0]. \tag{12}$$

Analogous to having the past represented on a Weber-Fechner scale, the translation distance
$\delta$ into the future also falls on a Weber-Fechner scale as the theta phase is swept from $0$
to $2\pi$. In other words, the amount of time spent within a theta cycle for larger translations
is exponentially smaller.
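The interplay between the traveling wave (eq. 8), the exponential modulation (eq. 10), and the exponential node spacing (eq. 11) can be verified numerically. In the sketch below, the number of nodes and the range of $s$ are assumed values, and the constants $b$ and $c$ are tied together by $\Phi_{max}/\Phi_0 = s_0/s_N$ as stated above. It checks that every node whose local phase has not wrapped around is translated by the same amount, $\delta(\theta_0)$ of eq. 12.

```python
import math

N = 50                         # assumed number of node intervals
S_0, S_N = 10.0, 1.0           # assumed largest and smallest decay rates
PHI_0 = 1.0
PHI_MAX = PHI_0 * S_0 / S_N    # from Phi_max / Phi_0 = s_0 / s_N
B = math.log(PHI_MAX / PHI_0) / math.pi
ONE_PLUS_C = (S_0 / S_N) ** (1.0 / N)

def rate(n):
    """Eq. 11: exponentially spaced decay rates."""
    return S_0 * ONE_PLUS_C ** (-n)

def local_phase(theta_0, n):
    """Eq. 8, with phases above pi wrapped to the negative half of the cycle."""
    theta = theta_0 - math.pi * n / N
    return theta - 2.0 * math.pi if theta > math.pi else theta

def phi(theta):
    """Eq. 10."""
    return PHI_0 * math.exp(B * theta)

def node_translation(theta_0, n):
    """Translation experienced by node n: Phi(theta_s) / s_n."""
    return phi(local_phase(theta_0, n)) / rate(n)

def translation(theta_0):
    """Eq. 12."""
    return (PHI_0 / S_0) * math.exp(B * theta_0)

def coherent_nodes(theta_0):
    """Indices of the nodes translated by exactly delta(theta_0)."""
    return [n for n in range(N + 1)
            if math.isclose(node_translation(theta_0, n), translation(theta_0),
                            rel_tol=1e-9)]

for theta_0 in (1.0, 4.0):
    nodes = coherent_nodes(theta_0)
    print(f"theta_0 = {theta_0}: delta = {translation(theta_0):.4f}, "
          f"coherent nodes {nodes[0]}..{nodes[-1]}")
```

For a reference phase in the first half of the cycle all nodes are coherent; in the second half, the dorsal nodes whose phase has jumped from $+\pi$ to $-\pi$ drop out, which corresponds to the region beyond line 2 in fig. 2.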

To emphasize the properties of the time-translated T state, consider the stimulus to be a Dirac
delta function at $\tau = 0$. From eq. 7 we can express the T layer activity analogous to eq. 4:

$$\mathbf{T}_\delta(\tau, s) \simeq \frac{s}{k!}\,[s\tau + \Phi(\theta_s)]^k\, e^{-[s\tau + \Phi(\theta_s)]} \tag{13}$$

Notice that eqs. 8 and 12 specify a unique relationship between $\delta$ and $\theta_s$ for any
given $s$. The r.h.s. above is expressed in terms of $\theta_s$ rather than $\delta$ so as to
shed light on the phenomenon of phase precession.

Since $\mathbf{T}_\delta(\tau, s)$ depends on both $\tau$ and $\theta_s$ only via the sum
$[s\tau + \Phi(\theta_s)]$, a given node will show identical activity for various combinations of
$\tau$ and $\theta_s$.^5 For instance, a node would achieve its peak activity when $\tau$ is
significantly smaller than its timescale ($k/s$) only when $\Phi(\theta_s)$ is large, meaning
$\theta_s \approx +\pi$. And as $\tau$ increases towards the timescale of the node, the peak
activity gradually shifts to earlier phases, all the way to $\theta_s \approx -\pi$. An important
consequence of imposing principle 1 is that the relationship between $\theta_s$ and $\tau$ on any
iso-activity contour is scale-invariant. That is, every node behaves similarly when $\tau$ is
rescaled by the timescale of the node. We shall further pursue the analogy of this phenomenon of
phase precession with neurophysiological findings in the next section (fig. 4).
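The phase precession described above can be made explicit with a short sketch. The activity in eq. 13 is maximal when $s\tau + \Phi(\theta_s) = k$, so with the exponential form of eq. 10 the peak phase is $\theta_s = (1/b)\log[(k - s\tau)/\Phi_0]$ whenever this lies within $(-\pi, +\pi)$. The parameter values below ($k = 10$, $\Phi_0 = 1$, $\Phi_{max} = 10$) are assumptions chosen for illustration.

```python
import math

K = 10                        # assumed order k
PHI_0, PHI_MAX = 1.0, 10.0    # assumed modulation range
B = math.log(PHI_MAX / PHI_0) / math.pi

def peak_phase(tau, s):
    """Local phase at which node s is most active at time tau after the stimulus.

    Solves s*tau + Phi(theta_s) = k for theta_s; returns None when no phase in
    [-pi, pi] satisfies the condition.
    """
    needed = K - s * tau                  # value Phi(theta_s) must take at the peak
    if needed <= 0.0:
        return None
    phase = math.log(needed / PHI_0) / B
    if abs(phase) > math.pi + 1e-9:
        return None
    return phase

for s in (0.5, 8.0):
    row = [peak_phase(x / s, s) for x in (0.0, 3.0, 6.0, 9.0)]
    print(f"s = {s}: " + ", ".join(f"{p:+.3f}" for p in row))
```

Both rows are identical: the peak phase starts at $+\pi$ immediately after the stimulus and moves to earlier phases as $s\tau$ grows, and it depends on $\tau$ only through the rescaled time $s\tau$, which is the scale-invariance of the precession noted in the text.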


C. Timeline of Future Prediction 


At any moment, $\mathbf{T}_\delta$ (eq. 13) can be used to predict the stimuli expected at a
future moment. Consequently, as $\delta$ is swept through within a theta cycle, a timeline of
future predictions can be simulated in an orderly fashion, such that predictions for closer
events occur at earlier phases (smaller $\theta_0$) and predictions of distant events occur at
later phases. In order to predict from a time-translated state $\mathbf{T}_\delta$, we need a
prediction mechanism. For our purposes, we consider here a very simple form of learning and
prediction, Hebbian association. In this view, an event is learned (or an association formed in
long-term memory) by increasing the connection strengths between the neurons representing the
currently experienced stimulus and the neurons representing the recent past events
($\mathbf{T}_0$). Because the T layer activity contains temporal information about the preceding
stimuli, simple associations between $\mathbf{T}_0$ and the current stimulus are sufficient to
encode and express well-timed predictions [?]. In particular, the term Hebbian implies that the
change in each connection strength is proportional to the product of pre-synaptic activity (in
this case the activity of the corresponding node in the T layer) and post-synaptic activity
corresponding to the current stimulus. Given that the associations are learned in this way, we
define the prediction of a particular stimulus to be the scalar product of its association
strengths with the current state of T. In this way, the scalar product of association strengths
and a translated state $\mathbf{T}_\delta$ can be understood as the future prediction of that
stimulus.

^5 While representing timescales much larger than the period of a theta cycle, $\tau$ can
essentially be treated as a constant within a single cycle. In other words, $\theta_s$ and $\tau$
in eq. 13 can be treated as independent, although in reality the phases evolve in real time.

Consider the thought experiment where a conditioned stimulus CS is consistently followed by
another stimulus, A or B, after a time $\tau_0$. Later, when CS is repeated (at a time
$\tau = 0$), the subsequent activity in the T nodes can be used to generate predictions for the
future occurrence of A or B. The connections to the node corresponding to A will be incremented
by the state of $\mathbf{T}_0$ when A is presented; the connections to the node corresponding to
B will be incremented by the state of $\mathbf{T}_0$ when B is presented. In the context of
Hebbian learning, the prediction for the stimulus at a future time, as a function of $\tau$ and
$\tau_0$, is obtained as the sum of the $\mathbf{T}_\delta$ activity of each node multiplied by
the learned association strength ($\mathbf{T}_0$):

$$\mathbf{p}_\delta(\tau, \tau_0) = \sum_{n=\ell}^{N} \mathbf{T}_\delta(\tau, s_n)\,\mathbf{T}_0(\tau_0, s_n)/s_n^{w}. \tag{14}$$

The factor $s^{-w}$ (for any $w$) allows for differential association strengths for the different
$s$ nodes, while still preserving the scale-invariance property. Since $\delta$ and $\theta_0$
are monotonically related (eq. 12), the prediction for various future moments happens at various
phases of a theta cycle.

Recall that all the nodes in the T layer are coherently time-translated only in the positive half
of the theta cycle. Hence for computing future predictions based on a time-translated state
$\mathbf{T}_\delta$, only coherent nodes should contribute. In fig. 2 the region to the right of
diagonal line 2 does not contribute to the prediction. The lower limit $\ell$ in the summation
over the nodes given in eq. 14 is the position of the diagonal line 2 in fig. 2, marking the
position of discontinuity where $\theta_s$ jumps from $+\pi$ to $-\pi$.
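The Hebbian prediction of eq. 14 can be sketched directly as a sum over coherent nodes. The example below reuses the thought experiment with A following the CS after $\tau_0 = 3$ and B after $\tau_0 = 7$, with $k = 10$, $s_0 = 10$, $s_N = 1$, $\Phi_{max} = 10$ and $w = 1$. The number of nodes, the grid of translations, and the rule used to select coherent nodes ($s_n \delta \le \Phi_{max}$, i.e. nodes whose local phase has not passed $\pi$) are assumptions of this sketch rather than a definitive implementation.

```python
import math

K = 10                   # order k
PHI_MAX = 10.0           # maximum of Phi, reached at theta_s = pi
S_0, S_N = 10.0, 1.0     # range of decay rates
W = 1.0                  # exponent w of the association weighting
N = 200                  # assumed number of node intervals

RATIO = (S_N / S_0) ** (1.0 / N)                 # equals 1 / (1 + c)
RATES = [S_0 * RATIO ** n for n in range(N + 1)]

def shape(x):
    """Common factor x^k e^{-x} / k! of eqs. 4 and 13."""
    return x ** K * math.exp(-x) / math.factorial(K)

def prediction(delta, tau, tau_0, rates=RATES):
    """Eq. 14: translated memory times stored association, summed over coherent nodes."""
    total = 0.0
    for s in rates:
        if s * delta > PHI_MAX:        # phase wrapped past pi: node is not coherent
            continue
        translated = s * shape(s * (tau + delta))   # T_delta(tau, s), with Phi = s*delta
        stored = s * shape(s * tau_0)               # T_0(tau_0, s), learned at encoding
        total += translated * stored / s ** W
    return total

# Translations swept within one theta cycle, log-spaced from delta(0) to delta(2*pi).
DELTAS = [0.1 * 100.0 ** (i / 400) for i in range(401)]

peak_a = max(DELTAS, key=lambda d: prediction(d, 0.0, 3.0))
peak_b = max(DELTAS, key=lambda d: prediction(d, 0.0, 7.0))
print(f"prediction of A peaks near delta = {peak_a:.2f}")
print(f"prediction of B peaks near delta = {peak_b:.2f}")
```

Under these assumptions the prediction for the nearer event A peaks at a smaller translation, and with a larger amplitude, than the prediction for B, so the two events are ordered along the $\delta$ axis.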

FIG. 3. Future timeline. Eq. 16 is plotted as a function of $\delta$. During training, the CS
was presented at $\tau_0 = 3$ before A and $\tau_0 = 7$ before B. Left: Immediately after
presentation of the CS, the predictions for A and B are ordered on the $\delta$ axis. Note that
the prediction for B approximates a rescaled version of that for A. Right: The prediction for B
is shown for varying times after presentation of the CS. With the passage of time, the prediction
of B becomes stronger and more imminent. In this figure, $\Phi_{max} = 10$, $\Phi_0 = 1$,
$k = 10$, $s_0 = 10$, $s_N = 1$, and $w = 1$.

In the limit when $c \to 0$, the $s$ values of neighboring nodes are very close and the summation
can be approximated by an integral. Defining $x = s\tau_0$, $y = \tau/\tau_0$, and
$v = \delta/\tau_0$, the above summation can be rewritten as


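$$\mathbf{p}_\delta(\tau, \tau_0) \simeq \frac{\tau_0^{\,w-2}}{k!^2} \int_{x_{min}}^{x_u} x^{2k+1-w}\,(y + v)^k\, e^{-x(1 + y + v)}\, dx \tag{15}$$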


Here $x_{min} = s_N\tau_0$, and $x_u = s_0\tau_0$ for $0 < \theta_0 < \pi$ and
$x_u = \Phi_{max}\tau_0/\delta$ for $\pi < \theta_0 < 2\pi$. The integral can be evaluated in
terms of lower incomplete gamma functions to be


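$$\mathbf{p}_\delta(\tau, \tau_0) \simeq \frac{\tau_0^{\,w-2}}{k!^2}\, \frac{[(\tau + \delta)/\tau_0]^k}{[1 + (\tau + \delta)/\tau_0]^{C}} \left(\Gamma\left[C, (\tau_0 + \tau + \delta)\,U\right] - \Gamma\left[C, (\tau_0 + \tau + \delta)\,s_N\right]\right), \tag{16}$$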


where $C = 2k + 2 - w$ and $\Gamma[\cdot, \cdot]$ is the lower incomplete gamma function. For
$\theta_0 < \pi$ (i.e., when $\delta < \Phi_{max}/s_0$), $U = s_0$, and for $\theta_0 > \pi$
(i.e., when $\delta > \Phi_{max}/s_0$), $U = \Phi_{max}/\delta$.
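As a consistency check on the closed form, the sketch below evaluates the integral of eq. 15 with Simpson's rule and compares it with the incomplete-gamma expression of eq. 16. It assumes that $C = 2k + 2 - w$ is a positive integer, so that the lower incomplete gamma function can be written as a Poisson-tail series, and it uses the parameter values quoted for fig. 3 as defaults. The function names are hypothetical.

```python
import math

def lower_gamma(c, x):
    """Lower incomplete gamma function for positive integer c and moderate x,
    via gamma(c, x) = (c-1)! * exp(-x) * sum_{j >= c} x^j / j!."""
    if x <= 0.0:
        return 0.0
    term = math.exp(c * math.log(x) - math.lgamma(c + 1) - x)   # j = c term
    total, j = 0.0, c
    while term > 1e-17 * total or j < x:
        total += term
        j += 1
        term *= x / j
    return total * math.gamma(c)

def upper_rate(delta, s_0, phi_max):
    """U of eq. 16: s_0 in the first half of the cycle, Phi_max/delta afterwards."""
    return s_0 if delta < phi_max / s_0 else phi_max / delta

def prediction_closed_form(delta, tau, tau_0, k=10, w=1,
                           s_0=10.0, s_n=1.0, phi_max=10.0):
    """Eq. 16."""
    c = 2 * k + 2 - w
    u = upper_rate(delta, s_0, phi_max)
    r = (tau + delta) / tau_0
    span = tau_0 + tau + delta
    prefactor = tau_0 ** (w - 2) * r ** k / (math.factorial(k) ** 2 * (1.0 + r) ** c)
    return prefactor * (lower_gamma(c, span * u) - lower_gamma(c, span * s_n))

def prediction_integral(delta, tau, tau_0, k=10, w=1,
                        s_0=10.0, s_n=1.0, phi_max=10.0, steps=4000):
    """Eq. 15 evaluated with composite Simpson's rule (steps must be even)."""
    u = upper_rate(delta, s_0, phi_max)
    r = (tau + delta) / tau_0                      # r = y + v
    lo, hi = s_n * tau_0, u * tau_0
    h = (hi - lo) / steps

    def integrand(x):
        return x ** (2 * k + 1 - w) * r ** k * math.exp(-x * (1.0 + r))

    acc = integrand(lo) + integrand(hi)
    for i in range(1, steps):
        acc += (4 if i % 2 else 2) * integrand(lo + i * h)
    return tau_0 ** (w - 2) / math.factorial(k) ** 2 * acc * h / 3.0

for delta, tau, tau_0 in [(0.5, 0.0, 3.0), (2.0, 0.0, 3.0), (5.0, 1.0, 7.0)]:
    a = prediction_closed_form(delta, tau, tau_0)
    b = prediction_integral(delta, tau, tau_0)
    print(f"delta={delta}, tau={tau}, tau_0={tau_0}: closed form {a:.6e}, integral {b:.6e}")
```

The agreement follows from $\int x^{C-1} e^{-ax}\,dx = a^{-C}\,\Gamma[C, ax]$ with $a = 1 + y + v$, which is the step that takes eq. 15 to eq. 16.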

Figure 3 provides a graphical representation of some
key properties of eq. 16. The figure assumes that the
CS is followed by A after $\tau_o = 3$ and followed by B after
$\tau_o = 7$. The left panel shows the predictions for both A
and B as a function of $\delta$ immediately after presentation
of CS. The prediction for A appears at smaller $\delta$ and with
a higher peak than the prediction for B. The value of $w$
affects the relative sizes of the peaks. The right panel
shows how the prediction for B changes with the passage
of time after presentation of the CS. As $\tau$ increases from
zero and the CS recedes into the past, the prediction of
B peaks at smaller values of $\delta$, corresponding to more
imminent future times. In particular, when $\tau_o$ is much
smaller than the largest (and larger than the smallest)
timescale represented by the nodes, the shape of
$p_\delta$ remains the same when $\delta$ and $\tau$ are rescaled by $\tau_o$.
Under these conditions, the timeline of future predictions
generated by $p_\delta$ is scale-invariant.

Since $\delta$ is in one-to-one relationship with $\theta_o$, as a
predicted stimulus becomes more imminent, the activity
corresponding to that predicted stimulus should peak at
earlier and earlier phases. Hence a timeline of future
predictions can be constructed from $p_\delta$ as the phase $\theta_o$ is
swept from 0 to $2\pi$. Moreover, the cells representing $p_\delta$
should show phase precession with respect to $\theta_o$.
Unlike cells representing $T_\delta$, which depend directly on their
local theta phase, $\theta_s$, the phase precession of cells
representing $p_\delta$ should depend on the reference phase $\theta_o$ at
the dorsal end of the s-axis. We shall further connect
this to neurophysiology in the next section (fig. 6).
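The qualitative behavior of the prediction described above can be checked numerically. The following is a minimal sketch, not the authors' code: it integrates an assumed form of the integrand of eq. 15, namely $x^{2k+1-w} (y + \tau/\tau_o)^k e^{-x(1 + y + \tau/\tau_o)}$ with prefactor $\tau_o^{w-2}/(k!)^2$, using a trapezoid rule and the parameter values quoted in the figure caption ($w = 1$). The function names `prediction` and `peak` and the grid over $\delta$ are hypothetical and introduced only for illustration.

```python
import math

# Parameter values quoted in the figure caption (w = 1 there).
K, W = 10, 1
S_O, S_N = 10.0, 1.0          # largest and smallest s values
PHI_MAX = 10.0

def prediction(delta, tau, tau_o, k=K, w=W, n_steps=1000):
    """Trapezoid-rule estimate of the assumed form of eq. 15."""
    # The upper cutoff U switches at delta = PHI_MAX / S_O, as stated in the text.
    u = S_O if delta < PHI_MAX / S_O else PHI_MAX / delta
    x_min, x_u = S_N * tau_o, u * tau_o
    if x_u <= x_min:
        return 0.0
    z = (tau + delta) / tau_o             # equals y + tau / tau_o

    def integrand(x):
        return x ** (2 * k + 1 - w) * z ** k * math.exp(-x * (1.0 + z))

    h = (x_u - x_min) / n_steps
    total = 0.5 * (integrand(x_min) + integrand(x_u))
    total += sum(integrand(x_min + i * h) for i in range(1, n_steps))
    return tau_o ** (w - 2) / math.factorial(k) ** 2 * total * h

def peak(tau, tau_o, deltas):
    """Return (delta, value) where the prediction is largest on the grid."""
    values = [prediction(d, tau, tau_o) for d in deltas]
    best = max(range(len(deltas)), key=values.__getitem__)
    return deltas[best], values[best]

deltas = [0.05 * i for i in range(1, 201)]     # delta from 0.05 to 10
peak_a = peak(0.0, 3.0, deltas)                # outcome A, tau_o = 3
peak_b = peak(0.0, 7.0, deltas)                # outcome B, tau_o = 7
peak_b_later = peak(3.0, 7.0, deltas)          # outcome B, 3 time units later
```

Under these assumptions, the peak for A lies at a smaller $\delta$ and is higher than the peak for B, and the peak for B moves to smaller $\delta$ and grows as $\tau$ increases, in line with the description of the two panels.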


III. COMPARISONS WITH 
NEUROPHYSIOLOGY 


FIG. 4. a. Neurophysiological data showing phase precession.
Each spike fired by a place cell is shown as a function of
its position along a linear track (x-axis) and the phase of local
theta (y-axis). After Mehta, et al., 2002. b. Simulated spikes
from a node in the T layer described by eq. 13 as a function
of $\tau$ and local phase $\theta_s$. The curvature is a consequence of
the exponential form of $\Phi$. See text for details.


The mathematical development focused on two entities,
$T_\delta$ and $p_\delta$, that change their value based on the theta
phase (eqs. 13 and 16). In order to compare these to
neurophysiology, we need a hypothesis linking
them to the activity of neurons in specific brain
regions. We emphasize that although the development
in the preceding section was done with respect to time,
all of the results generalize to one-dimensional position
as well. The overwhelming majority of evidence
for phase precession comes from studies of place
cells (but see [3]). Here we compare the properties of
$T_\delta$ to phase precession in hippocampal neurons and the
properties of $p_\delta$ to a study showing phase precession in
ventral striatum.

Due to various analytic approximations, the activity
of nodes in the T layer as well as the activity of the
nodes representing future prediction (eqs. 13 and 16) are
expressed as smooth functions of time and theta phase.
However, neurophysiologically, discrete spikes (action
potentials) are observed. In order to facilitate comparison
of the model to neurophysiology, we adopt a simple
stochastic spike-generating method. In this simplistic
approach, the activity of the nodes given by eqs. 13 and 16
is taken to be proportional to the instantaneous
probability of generating a spike. The probability of
generating a spike at any instant is taken to be the instantaneous
activity divided by the maximum activity achieved by
the node, provided the activity is greater than 60% of that
maximum. In addition, we add spontaneous stochastic
spikes at any moment with a probability of 0.05. For
all of the figures in this section, the parameters of the
model are set as $k = 10$, $\Phi_{max} = 10$, $w = 2$, $\Phi_o = 1$,
$s_N = 1$, $s_o = 10$.
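The spike-generation rule just described can be sketched in a few lines. This is an illustrative reading of the rule rather than the authors' code; the function name `generate_spikes`, its arguments, and the treatment of sub-threshold activity as zero probability are assumptions.

```python
import random

def generate_spikes(activity, threshold=0.6, background=0.05, rng=None):
    """Turn a smooth activity trace into stochastic spikes.

    A sample fires with probability activity / max(activity) when it exceeds
    `threshold` times the maximum; independently, a spontaneous spike occurs
    with probability `background`.
    """
    rng = rng or random.Random(0)
    peak = max(activity)
    spikes = []
    for a in activity:
        # Assumed: activity at or below the threshold contributes no driven spikes.
        p = a / peak if peak > 0 and a > threshold * peak else 0.0
        driven = rng.random() < p
        spontaneous = rng.random() < background
        spikes.append(driven or spontaneous)
    return spikes

# Toy activity trace (hypothetical values), with spontaneous spikes switched off.
trace = [0.1, 0.5, 1.0, 0.59, 0.0]
spikes = generate_spikes(trace, background=0.0)
```

With the background probability set to zero, only the sample at the maximum is guaranteed to fire, and samples at or below 60% of the maximum never fire.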

This relatively coarse level of realism in spike generation
from the analytic expressions is probably appropriate
to the resolution of the experimental data. There
are some experimental challenges associated with exactly
evaluating the model. First, theta phase has to be
estimated from a noisy signal. Second, phase precession
results are typically shown as averaged across many trials.
It is not necessarily the case that the average is
representative of an individual trial (although this is the
case at least for phase-precessing cells in medial entorhinal
cortex [28]). Finally, the overwhelming majority of
phase precession experiments utilize extracellular methods,
which cannot perfectly identify spikes from individual
neurons.

^ This is not meant to preclude the possibility that $p_\delta$ could be
computed at other parts of the brain as well.


A. Hippocampal phase precession 


It is clear from eq. 13 that the activity of nodes in the
T layer depends on both $\theta_s$ and $\tau$. Figure 4 shows phase
precession data from a representative cell (fig. 4a)
and spikes generated from eq. 13 (fig. 4b). The model
generates a characteristic curvature for phase precession,
a consequence of the exponential form of the function $\Phi$
(eq. 10). The example cell chosen in fig. 4a shows roughly
the same form of curvature as that generated by the
model. While it should be noted that there is some
variability across cells, careful analyses have led
computational neuroscientists to conclude that the canonical form
of phase precession resembles this representative cell. For
instance, a detailed study of hundreds of phase-precessing
neurons constructed averaged phase-precession plots
using a variety of methods and found a distinct curvature
that qualitatively resembles this neuron. Because
of the analogy between time and one-dimensional position,
the model yields the same pattern of phase
precession for time cells and place cells.

The T layer activity represented in fig. 4b is scale-invariant;
note that the x-axis is expressed in units of
the scale of the node ($k/s$). It is known that the spatial
scale of place fields changes systematically along the
dorsoventral axis of the hippocampus. Place cells in the
dorsal hippocampus have place fields of the order of a few
centimeters whereas place cells at the ventral end have
place fields as large as a few meters (fig. 5a).
However, all of them show the same pattern of precession
with respect to their local theta phase, that is, the phase
measured at the same electrode that records a given place
cell (fig. 5a). Recall that at any given moment, the local
phase of theta oscillation depends on the position along
the dorsoventral axis, denoted as the s-axis in
the model.

FIG. 5. Place cells along the dorsoventral axis of the
hippocampus have place fields that increase in size. a. The three
panels show the activity of place cells recorded at the dorsal,
intermediate and ventral segments of the hippocampus, when
a rat runs along an 18 m track. After Kjelstrup et al. (2008).
Each spike the cell fired is shown as a function of position and
the local theta phase at the cell's location when it fires (recall
that theta phase is not constant across the dorsoventral axis).
Regardless of the width of the place field, neurons at all
locations along the dorsoventral axis phase precess through the
same range of local theta phases. b. According to the model,
phase precession extends over the same range of values of local
theta $\theta_s$ regardless of the value of $s$, which sets the scale for a
particular node. As a consequence, cells with different values
of $s$ show time/place fields of different size but phase precess
over the same range of local theta. For the three figures, $s$
values of the nodes are set to 0.1, 0.22, and 0.7 respectively, and
they are assumed to respond to landmarks at location 4, 11,
and 3 meters respectively from one end of the track.

Figure 5a shows the activity of three different place
cells in an experiment where rats ran down a long track
that extended through open doors connecting three testing
rooms. The landmarks controlling a particular
place cell's firing may have been at a variety of locations
along the track. Accordingly, fig. 5b shows the activity of
cells generated from the model with different values of $s$
and with landmarks at various locations along the track
(described in the caption). From fig. 5 it can be qualitatively
noted that phase precession of different cells
depends only on the local theta phase and is unaffected by
the spatial scale of firing. This observation is
consistent with the model.



FIG. 6. a. A representative ramping cell in the ventral
striatum. On each trial the animal started the maze at S,
made a series of turns (T1, T2, etc.) and received reward at
F1 on 75 percent of trials. The total distance between S and
F1 is on the order of a few meters. Position along the track
is represented linearly on the x-axis for convenience. In the
top panel, the spikes are shown as a function of theta phase
at the dorsal hippocampus and position. The bottom panel
shows the firing rate as a function of position, which is seen
to gradually ramp up towards the reward location. b. The
activity of the prediction node generated by the model is plotted
with respect to the reference phase $\theta_o$ and position in the top panel,
and the average activity within a theta cycle is plotted
against position in the bottom panel.


B. Prediction of distant rewards via phase 
precession in the ventral striatum 

We compare the future predictions generated by the
model (eq. 16) to an experiment that recorded simultaneously
from the hippocampus and nucleus accumbens, a
reward-related structure within the ventral striatum.
Here the rat's task was to learn to make several turns in
sequence on a maze to reach two locations where reward
was available. Striatal neurons fired over long stretches
of the maze, gradually ramping up their firing as a function
of distance along the path and terminating at the
reward locations (fig. 6a, bottom). Many striatal neurons
showed robust phase precession relative to the theta
phase at the dorsal hippocampus (fig. 6a, top). Remarkably,
the phase of oscillation in the hippocampus controlled
firing in the ventral striatum to a greater extent
than the phase recorded from within the ventral striatum.
On trials where there was not a reward at the
expected location (F1), there was another ramp up to
the secondary reward location (F2), accompanied again
by phase precession (not shown in fig. 6a).

This experiment corresponds reasonably well to the
conditions assumed in the derivation of eq. 16. In this
analogy, the start of the trial (start location S) plays the
role of the CS and the reward plays the role of the
predicted stimulus. However, there is a discrepancy between
the methods and the assumptions of the derivation. The
ramping cell (fig. 6a) abruptly terminates after the reward
is consumed, whereas eq. 16 would gradually decay
back towards zero. This is because of the way the experiment
was set up: there were never two rewards presented
consecutively. As a consequence, having just received a
reward strongly predicts that there will not be a reward
in the next few moments. In light of this consideration,
we force the prediction generated in eq. 16 to be zero
beyond the reward location and let the firing be purely
stochastic. The top panel of fig. 6b shows the spikes
generated by model prediction cells with respect to the
reference theta phase and the bottom panel shows the
ramping activity computed as the average firing activity
within a complete theta cycle around any moment.

FIG. 7. Changing $\tau_o$ affects the phase at which prediction
cells start firing. At early times, the magnitude of translation
required to predict the $\tau_o = 3$ outcome is smaller than that
required to predict the $\tau_o = 7$ outcome. Consequently, the
cell begins to fire at a larger $\theta_o$ for $\tau_o = 7$. Parameter values
are the same as the other figures as given in the beginning of
this section, except that for clarity the background probability of
spiking has been set to zero.

The model correctly captures the qualitative pattern
observed in the data. According to the model, the reward
starts being predicted at the beginning of the track.
Initially, the reward is far in the future, corresponding to a
large value of $\delta$. As the animal approaches the location
of the reward, the reward moves closer to the present
along the $\delta$ axis, reaching zero near the reward location.
The ramping activity is a consequence of the exponential
mapping between $\delta$ and $\theta_o$. Since the proportion
of the theta cycle devoted to large values of $\delta$ is small,
the firing rate averaged across all phases is small while the
reward is distant and increases as the reward gets closer.
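To make the last step concrete, the sketch below computes how much of a theta cycle is spent on distant translations under an exponential phase-to-translation map. The explicit form $\delta(\theta_o) = (\Phi_o/s_o)\, e^{b\theta_o}$ with $b = \ln(\Phi_{max}/\Phi_o)/\pi$ is an assumption, chosen to agree with the statements in the text that $\delta = \Phi_{max}/s_o$ at $\theta_o = \pi$ and that $\Phi$ is exponential; the function names are hypothetical.

```python
import math

PHI_O, PHI_MAX, S_O = 1.0, 10.0, 10.0
B = math.log(PHI_MAX / PHI_O) / math.pi   # assumed growth rate of Phi with phase

def delta_of_phase(theta_o):
    """Assumed translation distance reached at reference phase theta_o."""
    return (PHI_O / S_O) * math.exp(B * theta_o)

def phase_of_delta(delta):
    """Inverse map: reference phase at which translation delta is reached."""
    return math.log(S_O * delta / PHI_O) / B

def cycle_fraction_beyond(delta):
    """Fraction of the theta cycle spent on translations larger than delta."""
    return (2.0 * math.pi - phase_of_delta(delta)) / (2.0 * math.pi)

far_half = cycle_fraction_beyond(5.0)     # delta between 5 and 10
near_half = 1.0 - far_half                # delta between 0.1 and 5
```

With the parameter values used in this section, $\delta$ runs from 0.1 to 10 over one cycle, and only about 15% of the cycle (a fraction $\ln 2 / \ln 100$) is spent on translations beyond $\delta = 5$. A prediction that peaks at large $\delta$ therefore contributes little to the cycle-averaged rate.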


C. Testable properties of the mathematical model 

Although the model aligns reasonably well with known
properties of theta phase precession, there are a number
of features of the model that have, to our knowledge,
not yet been evaluated. At a coarse level, the correspondence
between time and one-dimensional space implies
that time cells should exhibit phase precession with
the same properties as place cells. While phase precession
has been extensively observed and characterized in
hippocampal place cells, there is much less evidence for
phase precession in hippocampal time cells (but see [3]).

According to the model, the pattern of phase precession
is related to the distribution of $s$ values represented
along the dorsoventral axis. While it is known that a
range of spatial scales are observed along the dorsoventral
axis, their actual distribution is not known. The
Weber-Fechner scaling of the $s$ values is a strong prediction of the
framework developed here. Moreover, since $\Phi_{max}/\Phi_o = s_o/s_N$,
the ratio of the largest to smallest scales represented in
the hippocampus places constraints on the form of phase
precession. The larger this ratio, the larger will be the
value of $b$ in the exponential form of $\Phi$, and the curvature
in the phase precession plots (as in fig. 4) will only emerge
at larger values of the local phase $\theta_s$. Neurophysiological
observation of this ratio could help evaluate the model.

The form of $p_\delta$ (eq. 16) leads to several distinctive
features in the pattern of phase precession of the nodes
representing future prediction. It should be possible to
observe phase precession for cells that are predicting any
stimulus, not just a reward. In addition, the model's
assumption that a timeline of future predictions is aligned
with global theta phase has interesting measurable
consequences. Let us reconsider the thought experiment from
the previous section (fig. 3), where a stimulus predicts an
outcome after a delay $\tau_o$. Immediately after the stimulus
is presented, the value of $\delta$ at which the prediction peaks
is monotonically related to $\tau_o$. Since $\delta$ is monotonically
related to the reference phase $\theta_o$, the prediction cells will
begin to fire at later phases when $\tau_o$ is large, and as time
passes, they will fire at earlier and earlier phases all the
way until $\theta_o = 0$. In other words, the entry phase
(at which the firing activity begins) should depend on
$\tau_o$, the prediction timescale. This is illustrated in fig. 7
with $\tau_o = 3$ and $\tau_o = 7$, superimposed on the same graph
to make visual comparison easy. The magnitude of the
peak activity would in general depend on the value of
$\tau_o$ except when $w = 2$ (as assumed here for visual clarity).
Experimentally manipulating the reward times and
studying the phase precession of prediction cells could
help test this feature.
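As a rough worked example of the entry-phase dependence, assume the phase-to-translation map has the explicit form $\delta = (\Phi_o/s_o)\, e^{b\theta_o}$ with $b = \ln(\Phi_{max}/\Phi_o)/\pi$ (an assumption consistent with the exponential form of $\Phi$ and with $\delta = \Phi_{max}/s_o$ at $\theta_o = \pi$), and take $\delta \approx \tau_o$ as a crude stand-in for where the prediction peaks immediately after the stimulus. With the parameter values of this section:

```latex
\theta_o(\delta) = \frac{1}{b} \ln\!\left( \frac{s_o\, \delta}{\Phi_o} \right),
\qquad
b = \frac{\ln(\Phi_{max}/\Phi_o)}{\pi} = \frac{\ln 10}{\pi} \approx 0.733,
\\[6pt]
\theta_o(3) \approx \frac{\ln 30}{0.733} \approx 4.64 \ \text{rad},
\qquad
\theta_o(7) \approx \frac{\ln 70}{0.733} \approx 5.80 \ \text{rad}.
```

Under these assumptions the two outcomes would enter the theta cycle roughly 1.2 rad apart, with the more distant outcome at the later phase. The exact peak locations depend on the cutoffs in eq. 16, so these numbers are only indicative.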


IV. DISCUSSION 

This paper presented a neural hypothesis for implementing
translations of temporal and 1-d spatial memory
states so that future events can be quickly anticipated
without destroying the current state of memory.
The hypothesis assumes that time cells and place cells
observed in the hippocampus represent time or position
as a result of a two-layer architecture that encodes
and inverts the Laplace transform of external input. It
also assumes that sequential translations to progressively
more distant points in the future occur within each cycle
of theta oscillations. Neurophysiological constraints
were imposed as phenomenological rules rather than as
emerging from a detailed circuit model. Further, imposing
scale-invariance and coherence in translation across
memory nodes resulted in Weber-Fechner spacing for the
representation of both the past (spacing of $s_n$ in the memory
nodes) and the future (the relationship between $\delta$ and
$\theta_o$). Apart from providing cognitive flexibility in accessing
a timeline of future predictions at any moment, the
computational mechanism described qualitative features
of phase precession in the hippocampus and in the ventral
striatum. We have also pointed out certain
distinctive features of the model that can be tested with
existing technology.

A. Computational Advantages 

The property of the T layer that different nodes represent
the stimulus values from various delays (past moments)
is reminiscent of a shift register (or delay-line or
synfire chain). However, the two-layer network encoding
and inverting the Laplace transform of the stimulus has
several significant computational advantages over a shift
register representation.

(i) In the current two-layer network, the spacing of $s$
values of the nodes can be chosen freely. By choosing
exponentially spaced $s$-values (Weber-Fechner scaling),
the T layer can represent memory from exponentially
long timescales compared to a shift register
with an equal number of nodes, thus making it extremely
resource-conserving. Although information from longer
timescales is more coarse-grained, it turns out that this
coarse-graining is optimal to represent and predict
long-range correlated signals [22].

(ii) The memory representation of this two-layer network
is naturally scale-invariant. To construct
a scale-invariant representation from a shift register, the
shift register would have to be convolved with a
scale-invariant coarse-graining function at each moment, which
would be computationally very expensive. Moreover, it
turns out that any network that can represent such
scale-invariant memory can be identified with linear combinations
of multiple such two-layer networks.

(iii) Because translation can be trivially performed
when we have access to the Laplace domain, the two-layer
network enables translations by an amount $\delta$ without
sequentially visiting the intermediate states (translations
smaller than $\delta$). This can
be done by directly changing the connection strengths
locally between the two layers as prescribed by the diagonal
$R_\delta$ operator for any chosen $\delta$. Consequently the physical
time taken for the translation can be decoupled from the
magnitude of translation. One could imagine a shift register
performing a translation operation by an amount $\delta$
either by shifting the values sequentially from one node to
the next for $\delta$ time steps or by establishing non-local
connections between far away nodes. The latter would make
the computation very cumbersome because it would require
every node in the register to be connected to every
other node (since this should work for any $\delta$), which is
in stark contrast with the local connectivity required by
our two-layer network to perform any translation.

^ In this paper we considered sequential translations of various
values of $\delta$, since the aim was to construct an entire future timeline
rather than to discontinuously jump to a distant future state.
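Advantages (i) and (iii) can be illustrated with a small sketch. It relies on the standard Laplace-transform property that delaying an impulse by $\delta$ multiplies its transform by $e^{-s\delta}$, and it assumes that the diagonal $R_\delta$ operator acts in exactly this way, node by node. The geometric grid with ratio $1 + c$, the helper names, and the parameter values are hypothetical choices for illustration.

```python
import math

def weber_fechner_grid(s_o, s_n, c):
    """Geometric s-values from s_o down to about s_n with ratio 1 + c (assumed form)."""
    n_nodes = int(math.ceil(math.log(s_o / s_n) / math.log(1.0 + c))) + 1
    return [s_o * (1.0 + c) ** (-n) for n in range(n_nodes)]

def encode_impulse(s_values, tau):
    """Laplace transform of a unit impulse presented tau time units ago."""
    return [math.exp(-s * tau) for s in s_values]

def translate(f_values, s_values, delta):
    """Apply the diagonal factor exp(-s * delta) to each node independently."""
    return [f * math.exp(-s * delta) for f, s in zip(f_values, s_values)]

# Timescales spanning a factor of 1000 are covered by fewer than 100 nodes.
s_values = weber_fechner_grid(s_o=10.0, s_n=0.01, c=0.1)

state = encode_impulse(s_values, tau=2.0)
jumped = translate(state, s_values, delta=5.0)       # one local update per node
reference = encode_impulse(s_values, tau=7.0)        # impulse 7 time units ago
restored = translate(jumped, s_values, delta=-5.0)   # negative delta undoes the jump
```

Each node is updated using only its own $s$ value, so the cost of the translation does not depend on the size of $\delta$, and the translated state matches the state the network would hold if the impulse had occurred $\tau + \delta$ ago. A shift register with uniform unit spacing would need on the order of a thousand nodes to span the same range of timescales.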

Many previous neurobiological models of phase precession
have been proposed, and many assume
that sequentially activated place cells firing within a
theta cycle result from direct connections between those
cells [34], not unlike a synfire chain. Although taking
advantage of the Laplace domain in the two-layer network
to perform translations is not the only possibility,
it seems to be computationally powerful compared to the
obvious alternatives.


B. Translations without theta oscillations 

Although this paper focused on sequential translation
within a theta cycle, translation may also be accomplished
via other neurophysiological mechanisms. Sharp
wave ripple (SWR) events last for about 100 ms and are
often accompanied by replay events: sequential firing of
place cells corresponding to locations different from the
animal's current location [35-39]. Notably, experimentalists
have also observed preplay events during SWRs,
sequential activation of place cells that correspond to
trajectories that have never been previously traversed, as
though the animal is planning a future path. Because
untraversed trajectories could not have been used
to learn and build sequential associations between the
place cells along the trajectory, the preplay activity could
potentially be a result of a translation operation on the
overall spatial memory representation.

Sometimes during navigation, a place cell corresponding
to a distant goal location gets activated [38], as
though a finite distance translation of the memory state
has occurred. More interestingly, sometimes a reverse
replay is observed in which place cells are activated in
reverse order spreading back from the present location.
This is suggestive of translation into the past (as
if $\delta$ was negative), to implement a memory search. In
parallel, there is behavioral evidence from humans that
under some circumstances memory retrieval consists of
a backward scan through a temporal memory representation
[41]-[43] (although this is not neurally linked with
SWRs). Mathematically, as long as the appropriate
connection strength changes prescribed by the $R_\delta$ operator
can be specified, there is no reason translations with
negative $\delta$ or discontinuous shifts in $\delta$ could not be
accomplished in this framework. Whether these computational
mechanisms are reasonable in light of the neurophysiology
of sharp wave ripples is an open question.


C. Multi-dimensional translation 

This paper focused on translations along one dimension.
However, it would be useful to extend the formalism
to multi-dimensional translations. When a rat maneuvers
through an open field rather than a linear track,
phase precessing 2-d place cells are observed [44].
Consider the case of an animal approaching a junction along
a maze where it has to either turn left or right. Phase
precessing cells in the hippocampus indeed predict the
direction the animal will choose in the future [45]. In
order to generalize the formalism to 2-d translation, the
nodes in the network model must not be indexed only
by $s$, which codes their distance from a landmark, but
also by the 2-d orientation along which distance is calculated.
The translation operation must then specify not
just the distance, but also the instantaneous direction as
a function of the theta phase. Moreover, if translations
could be performed on multiple non-overlapping trajectories
simultaneously, multiple paths could be searched in
parallel, which would be very useful for efficient decision
making.


D. Neural representation of predictions 


The computational function of $p_\delta$ (eq. 16) is to represent
an ordered set of events predicted to occur in the
future. Although we focused on ventral striatum here
because of the availability of phase precession data from
that structure, it is probable that many brain regions represent
future events as part of a circuit involving frontal
cortex and basal ganglia, as well as the hippocampus and
striatum [46-52]. There is evidence that theta-like oscillations
coordinate the activity in many of these brain regions
[53-56]. For instance, 4 Hz oscillations show phase
coherence between the hippocampus, prefrontal cortex
and ventral tegmental area (VTA), a region that signals
the presence of unexpected rewards [56]. A great deal of
experimental work has focused on the brain's response to
future rewards, and indeed the phase-precessing cells in
fig. 6 appear to be predicting the location of the future
reward. The model suggests that $p_\delta$ should predict any
future event, not just a reward. Indeed, neurons that appear
to code for predicted stimuli have been observed in
the primate inferotemporal cortex [57] and prefrontal cortex
[58]. Moreover, theta phase coherence between prefrontal
cortex and hippocampus is essential for learning
the temporal relationships between stimuli [59]. So, future
predictions could be widely distributed throughout
the brain.